AITopics | exploration policy

Collaborating Authors

exploration policy

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Context Shift Reduction for Offline Meta-Reinforcement Learning Y unkai Gao

Neural Information Processing SystemsFeb-18-2026, 03:21:14 GMT

Offline meta-reinforcement learning (OMRL) utilizes pre-collected offline datasets to enhance the agent's generalization ability on unseen tasks.

machine learning, reinforcement learning, task representation, (12 more...)

Neural Information Processing Systems

Country:

Asia > China > Beijing > Beijing (0.04)
Asia > China > Shanghai > Shanghai (0.04)
Asia > China > Anhui Province > Hefei (0.04)

Genre: Research Report (0.93)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Add feedback

Overcoming the Sim-to-Real Gap: Leveraging Simulation to Learn to Explore for Real-World RL Andrew Wagenmaker

Neural Information Processing SystemsFeb-16-2026, 14:41:25 GMT

Such direct sim2real transfer is not guaranteed to succeed, however, and in cases where it fails, it is unclear how to best utilize the simulator. In this work, we show that in many regimes, while direct sim2real transfer may fail, we can utilize the simulator to learn a set of exploratory policies which enable efficient exploration in the real world.

artificial intelligence, machine learning, reinforcement learning, (17 more...)

Neural Information Processing Systems

Country:

South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
North America > United States > California > Alameda County > Livermore (0.04)
North America > United States > California > Alameda County > Berkeley (0.04)
Asia > Middle East > Jordan (0.04)

Genre:

Research Report > Experimental Study (1.00)
Research Report > New Finding (0.92)

Industry:

Information Technology (0.67)
Leisure & Entertainment > Games > Computer Games (0.40)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.69)
Information Technology > Artificial Intelligence > Representation & Reasoning > Search (0.45)

Add feedback

abe8e03e3ac71c2ec3bfb0de042638d8-Supplemental.pdf

Neural Information Processing SystemsFeb-10-2026, 14:38:16 GMT

c-bet, egocentric view, panoramic view, (16 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.99)

Add feedback

abe8e03e3ac71c2ec3bfb0de042638d8-Paper.pdf

Neural Information Processing SystemsFeb-10-2026, 14:38:12 GMT

agent, exploration, exploration policy, (14 more...)

Neural Information Processing Systems

Country:

North America > United States > Texas > Travis County > Austin (0.14)
North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
Asia > Middle East > Jordan (0.04)
Asia > Afghanistan > Parwan Province > Charikar (0.04)

Industry: Education (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.95)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.68)

Add feedback

6d0c932802f6953f70eb20931645fa40-Paper.pdf

Neural Information Processing SystemsFeb-9-2026, 06:22:53 GMT

agent, exploration policy, perception model, (14 more...)

Neural Information Processing Systems

Country:

North America > United States > California > Santa Clara County > Palo Alto (0.04)
North America > United States > Wisconsin > Dane County > Madison (0.04)
Asia > Japan > Honshū > Chūbu > Ishikawa Prefecture > Kanazawa (0.04)

Industry:

Education (0.46)
Government > Regional Government > North America Government > United States Government (0.46)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
(2 more...)

Add feedback

15825aee15eb335cc13f9b559f166ee8-Paper.pdf

Neural Information Processing SystemsFeb-7-2026, 14:15:16 GMT

agent, exploration, interaction, (14 more...)

Neural Information Processing Systems

Country:

North America > United States > California > Santa Clara County > Palo Alto (0.04)
North America > Canada (0.04)

Industry: Leisure & Entertainment > Games (0.46)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (0.69)

Add feedback

Context Shift Reduction for Offline Meta-Reinforcement Learning

Neural Information Processing SystemsDec-27-2025, 07:05:07 GMT

Offline meta-reinforcement learning (OMRL) utilizes pre-collected offline datasets to enhance the agent's generalization ability on unseen tasks. However, the context shift problem arises due to the distribution discrepancy between the contexts used for training (from the behavior policy) and testing (from the exploration policy). The context shift problem leads to incorrect task inference and further deteriorates the generalization ability of the meta-policy. Existing OMRL methods either overlook this problem or attempt to mitigate it with additional information. In this paper, we propose a novel approach called Context Shift Reduction for OMRL (CSRO) to address the context shift problem with only offline datasets. The key insight of CSRO is to minimize the influence of policy in context during both the meta-training and meta-test phases. During meta-training, we design a max-min mutual information representation learning mechanism to diminish the impact of the behavior policy on task representation. In the meta-test phase, we introduce the non-prior context collection strategy to reduce the effect of the exploration policy. Experimental results demonstrate that CSRO significantly reduces the context shift and improves the generalization ability, surpassing previous methods across various challenging domains.

context shift reduction, name change, offline meta-reinforcement learning, (9 more...)

Neural Information Processing Systems

Genre: Research Report (0.59)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.30)

Add feedback

SEAL: Self-supervised Embodied Active Learning using Exploration and 3D Consistency

Neural Information Processing SystemsDec-24-2025, 06:31:57 GMT

In this paper, we explore how we can build upon the data and models of Internet images and use them to adapt to robot vision without requiring any extra labels. We present a framework called Self-supervised Embodied Active Learning (SEAL). It utilizes perception models trained on internet images to learn an active exploration policy. The observations gathered by this exploration policy are labelled using 3D consistency and used to improve the perception model. We build and utilize 3D semantic maps to learn both action and perception in a completely self-supervised manner. The semantic map is used to compute an intrinsic motivation reward for training the exploration policy and for labelling the agent observations using spatio-temporal 3D consistency and label propagation. We demonstrate that the SEAL framework can be used to close the action-perception loop: it improves object detection and instance segmentation performance of a pretrained perception model by just moving around in training environments and the improved perception model can be used to improve Object Goal Navigation.

name change, perception model, self-supervised embodied active learning, (6 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Vision (0.60)
Information Technology > Artificial Intelligence > Robots (0.60)
Information Technology > Artificial Intelligence > Machine Learning (0.44)

Add feedback

Active Exploration for Learning Symbolic Representations

Garrett Andersen, George Konidaris

Neural Information Processing SystemsNov-21-2025, 14:02:20 GMT

It is therefore possible to learn the symbols by naively collecting samples from the environment, for example by random exploration.

artificial intelligence, machine learning, planning & scheduling, (17 more...)

Neural Information Processing Systems

Country:

North America > United States > Massachusetts > Hampshire County > Amherst (0.04)
North America > United States > California > Los Angeles County > Long Beach (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Planning & Scheduling (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.46)

Add feedback

Platform-Agnostic Reinforcement Learning Framework for Safe Exploration of Cluttered Environments with Graph Attention

Calzolari, Gabriele, Sumathy, Vidya, Kanellakis, Christoforos, Nikolakopoulos, George

arXiv.org Artificial IntelligenceNov-20-2025

Autonomous exploration of obstacle-rich spaces requires strategies that ensure efficiency while guaranteeing safety against collisions with obstacles. This paper investigates a novel platform-agnostic reinforcement learning framework that integrates a graph neural network-based policy for next-waypoint selection, with a safety filter ensuring safe mobility. Specifically, the neural network is trained using reinforcement learning through the Proximal Policy Optimization (PPO) algorithm to maximize exploration efficiency while minimizing safety filter interventions. Henceforth, when the policy proposes an infeasible action, the safety filter overrides it with the closest feasible alternative, ensuring consistent system behavior. In addition, this paper introduces a reward function shaped by a potential field that accounts for both the agent's proximity to unexplored regions and the expected information gain from reaching them. The proposed framework combines the adaptability of reinforcement learning-based exploration policies with the reliability provided by explicit safety mechanisms. This feature plays a key role in enabling the deployment of learning-based policies on robotic platforms operating in real-world environments. Extensive evaluations in both simulations and experiments performed in a lab environment demonstrate that the approach achieves efficient and safe exploration in cluttered spaces.

artificial intelligence, machine learning, reinforcement learning, (15 more...)

arXiv.org Artificial Intelligence

2511.15358

Country: Europe > Sweden (0.04)

Genre: Research Report (1.00)

Industry: Information Technology (0.46)

Technology: